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Abstract 

We consider a setting in which we have a treatment and a large number of covari- 
ates for a set of observations, and wish to model their relationship with an outcome of 
interest. We propose a simple method for modeling interactions between the treatment 
and covariates. The idea is to modify the covariate in a simple way, and then fit a stan- 
dard model using the modified covariates and no main effects. We show that coupled 
with an efficiency augmentation procedure, this method produces valid inferences in a 
variety of settings. It can be useful for personalized medicine: determining from a large 
set of biomarkers the subset of patients that can potentially benefit from a treatment. 
We apply the method to both simulated datasets and gene expression studies of cancer. 
The modified data can be used for other purposes, for example large scale hypothesis 
testing for determining which of a set of covariates interact with a treatment variable. 

1 Introduction 

To develop strategies for personalized medicine, it is important to identify the treatment 



and c ovariate interactions in the setting of randomized clinical trial [Royston and Sauerbrei 



2008]. To confirm and quantify the treatment effect is often the primary objective of a 
randomized clinical trial. Although important, the final result (positive or negative) of a 
randomized trial is a conclusion with respect to the average treatment effect on the entire 
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study population. For example, a treatment may be no better than the placebo in the overall 
study population, but it may be better for a subset of patients. Identifying the treatment 
and covariate interactions may provide valuable information for determining this subgroup 
of patients. 

In practice, there are two commonly used approaches to characterize the potential treat- 
ment and covariate interactions. First, a panel of simple patient subgroup analyses, where 
the treatment and control arms are compared in different patients subgroups defined a pri- 
ori, such as male, female, diabetic and non-diabetic patients, may be performed following 
the main comparison. Such an exploratory approach mainly focusses on simple interac- 
tions between treatment and one dichotomized covariate. However it will often suffer from 
false positive findings due to multiple testing and will not find complicated treatment and 
covariates interaction. 

In a more rigorous analytic approach, the treatment and covariates interactions can be 
examined in a multivariate regression analysis where the product of the binary treatment 
indicator and a set of baseline covariates are included in the regression model. Recent break- 
throughs in biotechnology makes a vast amount of data available for exploring for potential 
interaction effect with the treatment and assisting in the optimal treatment selection for 
individual patients. However, it is very difficult to detect the interactions between treatment 
and high dimensional covariates via direct multivariate regression modeling. Appropriate 
variable selection methods such as Lasso are needed to reduce the number of covariates hav- 
ing interaction with the treatment. The presence of main effect, which often have bigger 
effect on the outcome than the treatment interactions, further compounds the difficulties in 
dimension reduction since a subset of variables need to be selected for modeling the main 
effect as well. 

Recently, iBonetti and Gelberl 2004j formalized the subpopulation treatment effect pat- 
tern plot (S TEPP) for characteriz ing interactions between the treatment and continuous 



covariates. ISauerbrei et al.l [20071 ] proposed a efficient algorithm for multivariate model- 



building with flexible fractional polynomia ls interactions (MFPI) and compared the em- 
pirical performance of MFPI with STEPP. Su et al. 2008| proposed the classification and 
regre ssion tree method to explor e the covariates and treatment interactions in survival anal- 



ysis. iTian and Tibshiranil [2010| proposed an efficient algorithm to construct an index score, 



the sum of selected dichotomized covariates, to stratify patien ts population according to 
the treatment effect. In a more recent work, IZhao et al.l 2012| proposed a novel approach 
to directly estimate the optimal treatment selection rule via maximizing the expected clin- 
ical utility, which is equivalent to a weighted classification problem. There are also rich 
Bayesian literatures for flexible mo deling nonlinear and nonadditive /inter a ction relationship 
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However, most of these existing methods excepting that proposed by 
are not designed to deal with high-dimensional covariates. 
In this paper, we propose a simple approach to estimate the covariates and treatment 
interactions without the need for modeling main effects. The idea is simple, and in a sense, 
obvious. We simply code the treatment variable as ±1 and then include the products of this 
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Overall, pvalue= 0.343 



low gene score, pvalue= 0.393 



high gene score, pvalue= 0.032 




Figure 1: Example of the modified covariate approach, applied to gene expression data from 
multiple myeloma patients who were given one of two treatments in a randomized trial. Our 
procedure constructed a gene score based on 20 genes, to detect gene expression- treatment 
interactions. The numerical score was constructed on a training set, and then categorized 
into low, medium and high. The panels show the survival curves for a separate test set, 
overall and stratified by the score. 

variable with centered versions of each covariate in the regression model. 

Figure [1] gives a preview of the results of our method. The data consist of gene ex- 
pression measurements from multiple myeloma patients, who were randomized to one of 
two treatments. Our proposed method constructs a numerical gene score on a training set 
to reveal gene expression- treatment interactions. The panels show the estimated survival 
curves for patients in a separate test set, overall and stratified by the score. Although there 
is no significant survival difference between the treatments overall, we see that patients with 
medium and high gene scores have better survival with treatment PS341 than those with 
Doxyrubicin. 

In section [2, we describe the methods for continuous, binary as well as survival type 
of outcomes. We also establish a simple casual interpretation of the proposed method in 
several cases. In section 3, the finite sample performance of the proposed method has been 
investigated via extensive numerical study. In section 4, we apply the proposed method 
to a real data example about the Tamoxifen treatment for breast cancer patients. Finally, 
potential extensions and applications of the method were discussed in section 5. 
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2 The proposed method 



In the following, we let T = ±1 be the binary treatment indicator and and be the 
potential outcome if the patient received treatment T = 1 and —1, respectively. We only 
observe Y = Y^ T \ T and Z, a q— dimensional baseline covariate vector. We assume that 
the observed data consist of N independent and identically distributed copies of (Y,T, Z), 
{(Yj, Tj, Zj), % = 1, • • • , N}. Furthermore, we let W(-) : R q — > R p be a p dimensional functions 
of baseline covariates Z and always include an intercept. We denote W(Zj) by Wj in the 
rest of the paper. Here the dimension of Wj could be large relative to the sample size N. 
For simplicity, we assume that Prob(T = 1) = Prob(T = —1) = 1/2. 

2.1 Continuous response model 

When Y is continuous response, a simple multivariate linear regression model for character- 
izing the interaction between treatment and covariates is 

Y = #W(Z) + y W(Z) • T/2 + e, (1) 

where e is the mean zero random error. In this simple model, the interaction term 7qW(Z)-T 
models the heterogeneous treatment effect across the population and the linear combination 
of 7qW(Z) can be used for identifying the subgroup of patients who may or may not be 
benefited from the treatment. Specifically, under model (0Q), we have 

A(z) = E(F (1) -Y { - 1) \Z = z) 

= E(r|T = l,Z = z)-E(F|T = -l,Z = z) 
= 7oW(z), 

i.e., 7qW(z) measures the causal treatment effect for patients with baseline covariate Z. With 
observed data, 7 can be estimated along with {3q via the ordinary least squares method. 
On the other hand, noting the relationship that 

E(2FT|Z = z) = A(z), 

one may estimate 7 by directly minimizing 

N 

iV- 1 ^(2F,T,-7 / W i ) 2 . (2) 

i=i 

We call this the modified outcome method, where 2YT can be viewed as the modified out- 
come, which has been first proposed in Ph.D thesis of James Sinovitch, Harvard University. 

Under the simple linear model ([T]), both estimators are consistent for 7 , and the full 
least squares approach in general is more efficient than the modified outcome method. In 
practice, the simple multivariate linear regression model often is just a working model ap- 
proximating the complicated underlying probabilistic relationship between the treatment, 
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baseline covariates and outcome variables. It comes as a surprise, that even when model ([I]) 
is misspecified, multivariate linear regression and modified outcome estimators still converge 
to the same deterministic limit 7* and furthermore W(z)'7* is still a sensible estimator for 
the interaction effect in the sense that it seeks the "best" function of z in a functional space 
T to approximate A(z) by solving the optimization problem: 

minE{A(Z)-/(Z)} 2 , 

subject to / G 7 = {yW(z)|7 G R p }, 
where the expectation is with respect to Z. 



2.2 The Modified Covariate Method 

The modified outcomes estimator defined above is useful for the Gaussian case, but does not 
generalize easily to more complicated models. Hence we propose a new estimator which is 
equivalent to the modified outcomes approach in the Gaussian case and extends easily to 
other models. This is the main proposal of this paper. 
We consider the simple working model 

W(Z) ■ T 

r = tto + Yo 2 +e, (3) 

where e is the mean zero random error. Based on model ([3]), we propose the modified 
covariate estimator 7 as the minimizer of 

i=l v 7 

The fact that we can directly estimate 7 in model (J3J) without considering the intercept a 
is due to the orthogonality between W(Zj) • Tj and the intercept, which is the consequence 
of the randomization. That is, we simply multiply each component of W s by one-half the 
treatment assignment indicator (= ±1) and perform a regular linear regression. Now since 

t=l J i=l 

the modified outcome and modified covariate estimates are identical and share the same 
causal interpretation for the simple Gaussian model. Operationally, we can omit the intercept 
and perform a simple linear regression with the modified covariates. In general, we proposed 
the following modified covariate approach 
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1. Modify the covariate 

Zi -> Wi = W(Zi) ->• W* = W, • T,/2 

2. Perform appropriate regression 

^~7oW* (5) 

based on the modified observations 

(W*, Y{) = {(W r Ti)/2, Yj}, i = l,2,...N. (6) 

3. 7 W(z) can be used to stratify patients for individualized treatment selection. 



Figure |2] illustrates how the modified covariate method works for a single covariate Z, in 
two treatment groups. The raw data is shown the left, and the data with modified covariate 
is shown on the right. The slope of the regression line computed in the right panel estimates 
the treatment-covariate interaction. 

The advantage of this new approach is twofold: it avoids having to directly model the 
main effects and it has a causal interpretation for the resulting estimator regardless of the 
adequacy of the assumed working model ([3]). Furthermore, unlike modified outcome method, 
it is straightforward to generalize the new approach to other types of outcome. 

2.3 Binary Responses 

When Y is a binary response, in the same spirit as the continuous outcome case, we propose 
to fit a multivariate logistic regression model with modified covariates W* = W(Z) • T/2 
generalized from (J5j): 

Prob(y = i|z.r)= exp(7 ° w '> . (7) 

Noting that if model (0) is correctly specified, then 

A(z) = Prob(T (1) = 1|Z = z) - Prob(y (_1) = 1|Z = z) 

= Prob(F = 1\T = 1, Z = z) - Prob(F = 1\T = -1, Z = z) 
exp{ 7 [ ) W(z)/2} - 1 
exp{V W(z)/2} + l' 

and thus 7qW(z) has an appropriate causal interpretation. However, even when model (j7]) 
is not correctly specified, we still can estimate 7 by treating (J7J) as a working model. 

In general, the maximum likelihood estimator (MLE) of the working model, converges 
to a deterministic limit 7* and W(z)'7*/2 can be viewed as the solution to the following 
optimization problem 

niax/E [Yf(Z)T - log(l + e /{z)T )} 



6 




Figure 2: Example of the modified covariate approach. The raw data is shown the left, 
consisting of a single covariate Z and a treatment T = — 1 or 1. The treatment- covariate 
interaction has slope 7 approximately equal to 1. On the right panel we have plotted the 
response against Z ■ T/2. The the regression line computed in the right panel estimates the 
treatment effect for each give value of covariate Z. 
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subject to / G 7 = {yW(z)/2|7 G W}, 

where the expectation is with respect to (Y,T, Z). Therefore, where W(z) forms a "rich" set 

of basis functions, W(z)'7*/2 is an approximation to the minimizer of E |y/(Z)T — log(l + e^ z ) T )j . 

In the appendix, we show that the latter can be represented as 



/» = log 



l-A(z) 
1 + A(z) 



under very general assumptions. Therefore, 

exp{7'W(z)/2} 



A(z) 



exp{7'W(z)/2} + 1 



may serve as an estimate for the covariate-specific treatment effect and used to stratify 
patients population, regardless of the validity the working model assumptions. 

As described above, the MLE from the working model (J7|) can always be used to construct 
a surrogate to the personalized treatment effect measured by the "risk difference" 

A(z) = E(Y« -Y(~V\Z = z). 

On the other hand, different measures for individualized treatment effects such as relative 
risk may also be of interest. For example, if we consider an alternative approach for fitting 
the logistic regression working model (J7J) by letting 

n 

7 = argmax 7 |(1 - Y^'W* - l^T^'} , 

i=l 

then 7 converges to a deterministic limit 7* an d W(z)'7*(z)/2 can be viewed as an approx- 
imation to log{A(z)}, where 

~ , . Prob(T« = 1IZ = z) 
A z - 



Prob(r(- 1 ) = 1|Z = z) 



which measures the treatment effect based on "relative risk" rather than "risk difference". 
The detailed justification is given in the Appendix 6.1. 



2.4 Survival Responses 

When the outcome variable is survival time, we often do not observe the exact outcome for 
every subject in a clinical study due to incomplete follow-up. In this case, we assume that 
the outcome Y is a pair of random variables (X, 5) = {X A C, I(X < C)}, where X is the 
survival time of primary interest, C is the censoring time and 5 is the censoring indicator. 
Firstly, we propose to fit a Cox regression model 

A(t|Z,T) = A (t)e^' w * (8) 
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where A(i|-) is the hazard function for survival time X and Aq(-) is a baseline hazard function 
free of Z and T. When model (|S]) is correctly specified, 



A(z) 



log 



E{A (X( 1 ))|Z = z} 
E{A (X(-D)|Z = z} 

E{A (X)|T=l,Z = z} 



E{A (X)|T = -1,Z 
exp{- 7 ;W(z)} 



and 7qW(z) can be used to stratify patient population according to A(z), where A (t) = 
j \o(u)du is a monotone increasing function (the baseline cumulative hazard function). 
Under the proportional hazards assumption, the maximum partial likelihood estimator 7 is 
a consistent estimator for 7 and semiparametric efficient. Moreover, even when model (jSj) 
is misspecified, we still can "estimate" 7 by maximizing the partial likelihood function. In 
general, the resulting estima tor, 7, conve r ges to a deterministic limit 7*, which is the root 
of a limiting score equation Lin and Weil . Il989| . More generally, W(z)'7*/2 can be viewed 
as the solution of the optimization problem 



maxE 

/ 



N 



f(Z)T-\og{J2e fiz)T I(X>u)} 
3=1 



dN{u) 



subject to / e J 7 = {7'W(z)/2|7 e R p }, 

where N(t) = I(X < t)5i and the expectation is with respect to (Y,T,Z). Therefore, 
W(z)'7*/2 can be viewed as an approximation to 



/*(z) = argmaXfE 



N 



f(Z)T-\og{J2e fiz)T I(X>u)} 



dN(u). 



In appendix 6.1, we shown that the minimizer /* satisfies 



e r(z) E{A*(X (1) )|Z = z} - e- / *WE{A*(X (_1) )|Z = z} = E(A (1) |Z = z) - E(A ( - 1} |Z = z) 

for a monotone increasing function A* (it). Thus, when censoring rates are balanced between 
two arms, 



/* (z) w~log 



E{A*(lM)|Z = z} 
E{A*(IH))|Z = z} 



can be used for characterizing the covariate-specific treatment effect and stratifying the 
patient population even when the working model (jSj) is misspecified. 
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2.5 Regularization for high dimensional data 



When the dimension of W*, p, is high, we can easily apply appropriate variable selection 
procedures based on th e corresponding w orking model. For example, L\ penalized (Lasso) 
estimators proposed by iTibshiranil 1996j can be directly applied to the modified data ([6]). 
In general, one may estimate 7 by minimizing 



.1 n v 

-^^ywD + Ao^lTyi, 



(9) 



where 



*\2 



{K,yW*-log(l + eVwi)} 
7'W* - log{Ef =1 e^U{X 3 > X,)} 



A, 



for continuous response 
for binary response 
for survival response 



It might be reasonable to suppose that the covariates interacting with the treatment will 
more likely be the ones exhibiting impor tant main effects themselves. Therefore, one could 
also apply the adaptive Lasso procedure jZoul . |2006| with feature weights Wj proportional to 
the reciprocal of the univariate "association strength" between the outcome Y and the jth 
component of W(Z). Specifically, one may modify the penalty in (Q as 



3=1 



(10) 



where Wn 



or 



\6\i\) \ where 9j 1 , Oj(-i) and dj, are the estimated regression 



coefficients of the jth component of W(Z) in appropriate univariate regression analysis with 
observations from the group T = 1 only, from the group T = — 1 only, and from both 
groups, respectively. Other regularization methods such as elastic net may also be used 



Zou and Hastie. 2005 



Interestingly, one can treat the modified data <Q just as generic data and hence couple 
it with other statistical learning techniques. For example, one can apply a classifier such 
as prediction analysis of microarrays (PAM) to the modified data for the purpose of finding 
subgroup of samples in which the treatment effect is large. We also can do large scale 
hypothesis testing on the modified data to determine which gene-treatment interactions 
have a significant effect on the outcome. 



2.6 Efficiency Augmentation 

When the models ([3J El and E|) with modified covariates is correctly specified, the MLE 
estimator for 7* is the most efficient estimator asymptotically. However, when models are 
treated as working models subject to mis-specification, a more efficient estimator can be 
obtained for estimating the same deterministic limit 7*. To this end, noting the fact that 
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in general 7 is defined as the minimizer of an objective function motivated from a working 
model: 

1 N 

7 = argmin 7 -^/(y;,yw;) (11) 

i=i 

Noting that for any function a(z) : R q — > RP, E{Ti&(Zi)} = due to randomization, the 
minimizer of the augmented objective function 

1 N 

i=l 

converges to the same limit as 7, when N — > 00. Furthermore, by selecting an optimal 
augmentation term a (-), the minimizer of the augmented objective function can have smaller 
variance than that of the minimizer of the original objective function. 
In appendix 6.2, we show that 

ao(z) = ~W(z)E(y|Z = z) 

and 

a (z) = -iw(z){E(F|Z = z) - 0.5} 

are optimal choices for continuous and binary responses, respectively. Therefore, we proposed 
the following two-step procedures for estimating 7* : 
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1. Estimate the optimal a (z) : 

(a) For continuous response, fit the linear regression model E(Y\Z) = ^'B(Z) for 
appropriate function B{Z) with OLS. Appropriate regularization will be used if 
the dimension of B{Z) is high. Let 

a(z) = -iw(z) x i'B(z). 

(b) For binary response, fit the logistic regression model logit{Prob(y = 1|Z)} = 
C,'B(Z) for appropriate function B(Z) by maximizing the likelihood function. Ap- 
propriate regularization will be used if the dimension of B(Z) is high. Let 

1 { et' B ^ 1 
a(z) = — W(z) x <^ — 

y ' 2 y ! yi + e i'B{ Z ) 2 

Here B{z) = {Bi(z), • • • , B s (z)} and B k {z) : R q — >■ R 1 is selected basis function. 

2. Estimate 7* 

(a) For continuous response, we minimize 

N 

X 



with appropriate regularization if needed, 
(b) For binary response, we minimize 

1 N 

Jj E [-WW - log(l + e^ w ?)} - 7 / a(Z,)T J 



N 

i=i 



with appropriate regularization if needed. 



For survival outcome, the log-partial likelihood function is not a simple sum of i.i.d terms. 
However, in Appendix 6.2 we show that the optimal choice of a(z) is 



-W(z){G 1 (r;z) + G' 2 (r;z)}- / R(u){G 1 (du;z) - G 2 (du;z)} 



ao(z) = -- 



where Gi(u;z) = £{M(w)|Z = z,T = 1}, G 2 («;z) = £{M(u)|Z = z,T = -1}, 



M(t,W*, 7 *) = JV(i) - f 

Jo 



I{X > u)ef' w *dE{N{u)} 
o E{e^' w */(X > «)} 
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and 



R(«;t*) = 



E{W*e^' w *I(X > u)} 



E{e 7 ' w */(X > u)} ' 

Unfortunately, ao(z) depends on the unknown parameter 7*. On the other hand, on high- 
dimensional case, the interaction effect is usually small and it is not unreasonable to assume 
that 7* ~ 0. Furthermore, if the censoring patterns are similar in both arms, we have 
Gi(u,z) G 2 (u, z). Using these two approximations, we can simplify the optimal augmen- 
tation term as 

ao(z) = -iw(z) {Gx(r; z) + G 2 (r; z)} = -±W(z) x E{M(r)|Z = z) 



where 



M(t) = N(t) - [ 
Jo 



I(X > u)dE{N(u)} 



E{I(X > u)} 

Therefore, we propose to employ the following approach for implementing the efficiency aug- 
mentation procedure,: 



1. Calculate 



Mi(r) = Ni(r) - 



-j(x,> M )4EL^H} 



for i — 1, • • • , N and fit the linear regression model E(M(t)\Z) = £'.B(Z) for appropri- 
ate function B(Z) with OLS and appropriate regularization if needed. Let 



a(z) = -iw(z) x i'B(z). 



2. Estimate 7* by minimizing 



1 N 

-Y 

i=i 



7 'W*-log{f;e^7(^>^)} 

j'=i 



A, - 7 , a(Z,)T J 



with appropriate penalization if needed. 



Remarks 1 

When the response is continuous, the efficient augmentation estimator is the minimizer 

of 



N 

E 

N 



1 



1 



- V i — -7'W(Z l )T,/2 - V^Z^T, 



If I ') 

g [ Y i - ? B ( Z i) - ^'W^T, j + constant. 
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This equivalence implies that this efficiency augmentation procedures is asymptotically 
equivalent to that based on a simple multivariate regression with main effect £'i?(Zj) and 
interaction 'y'W(Z) • T. This is not a surprise. As we pointed out in section 2.1, the choice 
of the main effect in the linear regression does not affect the asymptotical consistency of 
estimating the interactions. On the other hand, a good choice of main effect model can help 
to estimate the interaction, i.e., personalized treatment effect, more accurately. 

Another consequence is that one may directly use the same algorithm solving standard 
optimization problem to obtain the augmented estimator when lasso penalty is used. For 
binary or survival response, the augmented estimator under lasso regularization can be ob- 
tained with slightly modified algorithm designed for lasso optimization as well. The detailed 
algorithm is given in the appendix 6.3. 

Remarks 2 

For nonlinear models such as logistic and Cox regressions, the augmentation method 
is NOT equivalent to the full regression approach including main effect and interaction 
terms. In those cases, different specification of the main effects in the regression model result 
in asymptotically different estimates for the interaction term, which, unlike the proposed 
modified covariate estimator, in general can not be interpreted as the personalized treatment 
effect. 



Remarks 3 

With binary response, the estimating equation targeting on approximating the relative 
risk is 

N 

^wafi-^-y^*} 

1=1 

and the optimal augmentation term ao(z) can be be approximated by 

-~W(z){E(y|Z = z)-l} 
when 7* w 0. The efficiency augmentation algorithm can be carried out accordingly. 



Remarks 4 

The simi lar technique can also be used for improving other estimators such as that 



proposed by lZhao et al.l [20121 ]. where the surrogate objective function for the weighted mis- 
classification error can be written in the form of ( 12. 6 p as well. The optimal function a (z) 
needs to be derived case by case. 
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3 Numerical Studies 



In this section, we perform an extensive numerical study to investigate the finite sample 
performance of proposed method in various settings: the treatment may or may not have 
marginal main effect between two groups; the personalized treatment effect may depend 
on complicated function of covariates such as interactions among covariates; the regression 
model for detecting the interaction may or may not be correctly specified. Due to the 
limitation of the space, we only present simulation results from the selected representive 
cases. The results for other scenarios are similar to those presented. 

3.1 Continuous responses 

For continuous responses, we generated N independent Gaussian samples from the regression 
model 

v v 

3=1 3=1 

where the covariate (Zi, • • • , Z p ) follows a mean zero multivariate normal distribution with 
a compound symmetric variance-covariance matrix, (1 — p)I p + pl'l, and e ~ N(0, 1). We let 
(7i, 72, 7s, 74, 75, • • • , 7p) = (1/2, -1/2, 1/2, -1/2, 0, • • • , 0), a = y/2, N = 100, and p = 50 
and 1000 representing high and low dimensional cases, respectively. The treatment T was 
generated as ±1 with equal probability at random. We consider four sets of simulations: 

1. (3j = <j< 10)/4 and p = 0; 

2. = (-l)^ +1 /(3 < j < 10)/4 and p = 0.5; 

3. = (-iy +1 /(3 < j < 10)/2 and p = 0; 

4. (3j = (-iy +1 I(3 < j < 10)/2 and p = 0.5. 

Settings 1 and 2 presents relative moderate main effect, where the variability in response 
contributable to the main effect is the same as that to the interaction. Settings 3 and 4 
represent relative big main effect, where the variability in response contributable to the 
main effect is twice as big as that to the interaction. For each of the simulated data set, we 
implemented three methods: 

• full regression: The first method is to fit a multivariate linear regression with full main 
effect and covariate/treatment interaction terms, i.e., the dimension of the covariate 
matrix was 2{p + 1). The Lasso was used to select the variables. 

• new: The second method is to fit a multivariate linear regression with the modified 
covariate W* = (1, Z)'-T/2 as the covariates, i.e., the dimension of the covariate matrix 
is p + 1. Again, the Lasso is used for selecting variables having treatment interaction. 
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• new/ augmented: the proposed method with efficiency augmentation, where E(Y|Z) is 
estimated with lasso-regularized ordinary least squared method and B(z) = z. 

For all three methods, we selected the Lasso penalty parameter via 20-fold cross-validation. 
To evaluate the performance of the resulting score measuring the individualized treatment 
effect, we estimated the Spearman's rank correlation coefficient between the estimated score 
and the "true" treatment effect 



in an independently generated set with a sample size of 10000. Based on 500 sets of sim- 
ulations, we plotted the boxplots of the rank correlation coefficients between the estimated 
scores -y'Z and A(Z) under simulation settings (1), (2), (3) and (4) in top left, top right, 
bottom left and bottom right panels of Figure [31 respectively. When the main effect is mod- 
erate and covariates are independent (setting 1), the performance of the proposed method 
is better than that of the full regression approach. However, when the main effect is rel- 
atively big compared to interactions (settings 3 and 4), the proposed method is unable to 
estimate the correct individualized treatment effect well and is actually inferior to the simple 
regression method. On the other hand, the performance of the "new/augmented" is the best 
or nears the best across all the four settings and is sometimes substantially better than its 
competitors. 

3.2 Binary responses 

For binary responses, we used the same simulation design as that for the continuous response. 
Specifically, we generated N independent binary samples from the regression model 



where all the model parameters were the same as those in the case of continuous response. 
Noting that the logistic regression model is misspecified under the chosen simulation design. 
We also considered the same four settings with different combinations of (3j and p. For each 
of the simulated data set, we implemented three methods: 

1. full regression: The first method is to fit a multivariate logistic regression with full main 
effect and covariate/treatment interaction terms, i.e., the dimension of the covariate 
matrix was 2(p + 1). The Lasso was used to select the variables. 

2. new: The second method is to fit a multivariate logistic regression (without intercept) 
with the modified covariate W* = (1, Z)' ■ T/2 as the covariates. Again, the Lasso was 
used for selecting variables having treatment interaction. 

3. new/augmented: the proposed method with efficiency augmentation, where E(Y|Z) is 
estimated with Lasso-penalized logistic regression. 



A(Z) = E(F (1) - F ( - 1} |Z) = (Z x -Z 2 + Z 3 - Z A ) 



Y 




(13) 
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To evaluate the performance of the resulting score measuring the individualized treatment 
effect, we estimated the Spearman's rank correlation coefficient between the estimated score 
and the "true" treatment effect 

A(z) = E(y( 1) -y(- 1 )|z) 



O"0 / V °"° 

where $ was the cumulative distribution function of standard normal. Although the scores 
measuring the interaction from the first and second/third methods were different even when 
the sample size goes to infinity, the rank correlation coefficients put them on the same footing 
in comparing performances. 

In top left, top right, bottom left and bottom right panels of Figure HJ we plotted the 
boxplots of the correlation coefficients between the estimated scores y Z and A(Z) under 
simulation settings (1), (2), (3) and (4), respectively. The patterns are similar to that for the 
continuous response. The "new/augmented method" performed the best or close to the best 
in all the four settings. The efficiency gain of the augmented method in setting 4 where the 
main effect was relative big and covariates were correlated was more significant than that in 
other settings. 

In additional simulation study, we also evaluated the empirical performance of the gen- 
eralized modified covariate approach with nearest shrunken centroid classifier. In one set 
of the simulation, the binary response is simulated from model (fTBl) with p = 50, n = 200, 
f3j = I(j < 20)/2, 7j- = I(j < 4)/2 and o"o = a/2. Here the first four predict ors have covariate 



treat ment interaction. We applied the nearest shrunken centroid classifier Tibshirani et al. 



2001] to the modified data, (6) with the shrinkage parameter selected via 10 fold cross- 
validation. This produced a posterior probability estimator for {Y = 1}. We then applied 
this estimated posterior probability interaction score, to a independently generated test set 
of size 400. We dichotomized the observations in the test set into high and low score groups 
according to the median value and calculated the differences between two treatment arms in 
high and low score groups separately. With 100 replications, the boxplots of the differences 
in high and low score groups were shown in the right panel of Figure [5j For comparison 
purposes, the empirical differences between two arms in high and low score groups deter- 
mined by the true interaction score YTj=\ Ij^j were shown in the left panel of figure [5j It can 
be seen that modified covariate approach, coupled with nearest shrunken centroid classifier, 
provided reasonable stratification for differentiating the treatment effect. 

3.3 Survival Responses 

For survival responses, we used the same simulation design as that for the continuous and 
binary responses. Specifically, we generated N independent survival time from the regression 
model 

(p v \ 

^/3^, + ^ 7 ,^T + ao-e , (14) 
3=1 3=1 / 
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where all the model parameters were the same as in the previous subsections. The censoring 
time was generated from uniform distribution £7(0, £ )> where £ was selected to induce 25% 
censoring rate. For each of the simulated data set, we implemented three methods: 

1. full regression: The first method was to fit a multivariate Cox regression with full main 
effect and covariate/treatment interaction terms, i.e., the dimension of the covariate 
matrix was 2p + 2. The Lasso was used to select the variables 

2. new: The second method was to fit a multivariate Cox regression with the modified 
covariate W* = (1, Z)' ■ T/2 as the covariates. Again, the Lasso was used for selecting 
variables having treatment interaction. 

3. new/augmented: the proposed method with efficiency augmentation. To model the 
E{M(r)|Z}, we used linear regression with lasso regularization method. 

To evaluate the performance of the resulting score measuring the individualized treatment 
effect, we estimated the Spearman's rank correlation coefficient between the estimated score 
and the "true" treatment effect based on survival probability at to — 5 

A(Z) = Prob(X {1) > t \Z) - Prob(X (_1) > t \Z) 

g f E? = i(ft + J^i ~ MfrA g / Ej = i(ft ~ n)Zj - log(tp) 

y o"o J \ co 

In top left, top right, bottom left and bottom right panels of Figure El we plotted the 
boxplots of the correlation coefficients between the estimated scores 'y'Z and A(Z) under 
simulation settings, (1), (2), (3) and (4), respectively. The patterns were similar to those 
for the continuous and binary responses and confirmed our findings that the "efficiency- 
augmented method" performed the best among the three methods in general. 



4 Examples 



It has been known that the breast cancer can be classified into different subtypes using 
gene express ion profile and t he effective treatment may be different for different subtypes of 



the disease [Loi et all l2007l |. In this section, we apply the proposed method to study the 
potential interactions between gene expression levels and Tamoxifen treatment in the breast 
cancer patients. 



The data set consists of 414 patients in the cohort GSE6532 collected by lLoi et al.l [2007 



for the purpose of characterizing ER-positive subtypes with gene expression profiles. The 
dataset including demographic information and gene expression levels can be downloaded 
from the website www.ncbi.nlm.nih. gov/geo/query/acc.cgi?acc=GSE6532. Excluding pa- 
tients with incomplete information, there are 268 and 125 patients receiving Tamoxifen and 
alternative treatments, respectively. In addition to the routine demographic information, we 
have 44, 928 gene expression measurements for each of the 393 patients. The outcome of 
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the primary interest here is the distant metastasis free survival time, which subjects to right 
censoring due to incomplete follow-up. The metastasis free survival times in two treatment 
groups are not statistically different with a two-sided p value of 0.59 based on the log-rank 
test (Figure [7]). The goal of the analysis is to construct a score using gene expression levels 
for identifying subgroup of patients who may or may not be benefited from the Tamoxifen 
treatment in terms of the distant metastasis free survival. To this end, we select the first 90 
patients in the Tamxifen arm and an equal number of patients in the alternative treatment 
arm to form the training set and reserve the rest 213 patients as the independent validation 
set. In selecting the training and validation sets, we use the original order of the observations 
in the dataset without additional sorting to ensure an objective analysis. 

We first identify 5,000 genes with highest empirical variances and then construct an 
interaction score by fitting the Lasso penalized Cox regression model with modified covariates 
based on the 5,000 genes in the training set. The Lasso penalty parameter is selected via 
20-fold cross-validation. The resulting interaction score is a linear combination of expression 
levels of seven genes. Here, a low interaction score favors Tamoxifen treatment. We apply 
the gene score to classify the patients in the validation set into high and low score groups 
according to if her score is greater than the median level. In the high score group, the distant 
metastasis free survival time in the Tamoxifen group is shorter than that in the alternative 
group with an estimated hazard ratio of 3.52 for Tamoxifen versus non- Tamoxifen treatment 
group (logrank test p = 0.064). In the low score group, the distant metastasis free survival 
time in the Tamoxifen group is longer than that in the alternative group with an estimated 
hazard ratio of 0.694 (p = 0.421). The estimated survival functions of both treatment groups 
are plotted in the upper panels of Figure El The interaction between constructed gene score 
and treatment is statistically significant in the multivariate Cox regression based on the 
validation set (p = 0.004). 

Furthermore, we implement the efficiency augmentation method and obtain a new score, 
which is based on expression level of eight genes. Again, we classify the patients in the 
validation set into high and low score groups based on the constructed gene score. In the 
high score group, the distant metastasis free survival time in the Tamoxifen group is shorter 
than that in the alternative group with a p value of 0.158. The estimated hazard ratio is 2.29 
for Tamoxifen versus non- Tamoxifen treatment group. In the low score group, the distant 
metastasis free survival time in the Tamoxifen group is longer than that in the alternative 
group with an estimated hazard ratio of 0.828. The p value from the logrank test is not 
significant (p = 0.697). The estimated survival functions of both treatment groups are 
plotted in the middle panels of Figure El The separation is slightly worse than that based 
the gene score constructed without augmentation. The interaction between constructed gene 
score and treatment is also statistically significant at 0.05 level (p = 0.025). 

For comparison purpose, we also fit a multivariate Cox regression model with treatment, 
the gene expression levels, and all treatment-gene interactions as the covariates. Lasso 
penalty is selected via 20-fold cross validation. The resulting gene score is a single gene 
based on the estimated treatment-gene interaction term of the Cox model. However, the 
interaction score fails to stratify the population according to the treatment effect in the 
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validation set. The results are shown in the lower panel of Figure |BJ The interaction between 
the constructed gene score and treatment is not statistically significant (p = 0.29). 

To further objectively examine the performance of the proposal in this data set, we 
randomly split the data into training and validation sets and construct the score mea- 
suring individualized treatment effect in the training sets with three methods: "new", 
"new/augmented" and "full regression". Patients in the test set are then stratified into 
high and low score groups. We calculate the difference in log hazard ratio for Tamoxifen 
versus non- Tamoxifen treatment between high and low score groups. A positive number in- 
dicates that women in low score group benefitted more from Tamoxifen treatment than those 
in high score group as the model indicates. In Figure [9j we plot the boxplot of the differences 
in the log hazard ratio based on 100 random splitting. To speed the computation, all scores 
are constructed using only 2500 genes with top empirical variances. The results indicate that 
the proposed and the corresponding augmented methods tend to perform better than the 
common full regression method and this observation is consistent with our previous findings 
based on simulation studies. 

As a limitation of this example, the treatment is not randomly assigned to the patients 
as in a standard randomized clinical trial. Therefore, the results need to be interpreted with 
caution. In addition, the sample size is limited and further verification of the constructed 
gene score with independent data sets is needed. 



5 Discussion 



In this paper we have proposed a simple method to explore the potential interactions between 
treatment and a set of high dimensional covariates. The general idea is to use W(Z) -T/2 as 
new covariates in a regression or generalized regression model to predict the outcome variable. 
The resulting linear combination / y'W(Z) is then used to stratify the patient population. A 
simple efficiency augmentation procedure can be used to improve the performance of the 
method. 

The proposed method can be used in much broader way. For example, after creating the 
modified covariates W(Z) ■ T/2, other data mining techniques such as PAM a nd support 



vector machines can also be used to link the new covariates with the outcomes Friedman 



19911 . iTibshirani et al.l . lHastie and Zhul . 120061 ] . Most dimension reduction methods in the 
literature can be readily adapted to handle the potentially high dimensional covariates. For 
univariate analysis, we also may perform large scale hypothesis testing on the modified data, 
to identify a list of covariates having interaction with the treatment : one could for exam ple 
directly use the Significance Analysis of Microarrays (SAM) method Gilbert et al.l . |2002| for 
this purpose. Extensions in these directions are promising and warrant further research. 

Lastly, the proposed method can also be used to analyze data from observational studies. 
However, the constructed interaction score may lose the corresponding causal interpreta- 
tion. On the other hand, if a reasonable propensity score model is available, then we still 
can implement the modified covariate approach on matched or re weighted data such that 



the resulted score still retains the appropriate causal interpretation [Rosenbaum and Rubin 
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6 Appendix 

6.1 Justification of the objective function based on the working 
model 

Under the linear working model for continuous response, we have 




and 



E{l(Y, /(Z)T)|Z, T=-l}= 1 - [E{(Y^)) 2 |Z} + 2m_ 1 (Z)/(Z) + /(Z) 2 ] , 



where m t (z) = E(Y^|Z = z) for t — 1 and -1. Therefore 
£(/) =E{/(Y,/(Z)T)} 

=E Z \E Y {l(Y, f(Z)T)\Z,T = 1} + \e y {1(Y, f(Z)T)\Z,T = -1} 



=E Z I -{mi(Z) - m_i(Z)} - /(Z) J + constant. 
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Therefore, the minimizer of this objective function 

r(z) = 1 -{m 1 (z)-m_ 1 (z)}= 1 -A(z) 

for all z G Support of Z. 



Under the logistic working model for binary response, we have 

E{l(Y, /(Z)T)|Z, T = 1} = mx(Z)/(Z) - log(l + e'< z >), 

and 



E{Z(y, /(Z)T)|Z, T = —1} = -m_x(Z)/(Z) - log(l + e"'< z >). 



Thus 



£(/) =E{/(F,/(Z)T)} 
=Ez 



iE y {/(F, /(Z)T)|Z,T = 1} + ±E y {i(y, /(Z)T)|Z,T = -1} 



= 1e z [A(Z)/(Z) - log(l + e* z >) - log(l + e-W)] . 



Therefore 



dC(f) 1 



df 2 

which implies that the minimizer of £(/) 



A(Z)- 



1 - 

1 + e/( z ) 



r(z) = io g 



1 - A(z) 
1 + A(z) 



for all z G Support of Z or equivalently 



A(z) = 



Alternatively, under the logistic working model with binary response, we may focus on 
the objective function 



Therefore 
and 



l(Y, /(Z)T) = (1 - Y)f(Z)T - Ye~ f W T . 

E{l(Y, /(Z)T)|Z, T = 1} = {1 - m 1 (Z)}/(Z) - m 1 (Z)e^ z ), 
E{[(F, /(Z)T)|Z, T = —1} = -{1 - m_!(Z)}/(Z) - m_ 1 (Z)e^. 
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Thus 



£(/) =E{l(Y,f(Z)T)} 
"1 



=Ez 
=E Z 



2 E y {/(F, /(Z)T)|Z,T = 1} + -E Y {l(Y : /(Z)T)|Z,T = -1} 
i{mx(Z) - m_x(Z)}/(Z) - ^(Z)^ 2 ) - m_ 1 (Z)e^ 



Therefore 



= I Ez [{rrn(Z) - m_i(Z)} + m 1 (Z)e^ z ) - m_ 1 (Z)e / ( z )] 



which implies that the minimizer of C(f) 



r (z) = log 



m_i(z) 



for all z G Support of Z. 



Under the Cox working model for survival outcome, we have 
E Y {l(YJ(Z)T)\Z,T} =E Y (jT [T/(Z) - log{E(e^ 2 )/(X > «))}] d7V(u)|Z,T 

= jf [/(Z) - log{E(e T ^ z )/(X > «))}] E {/(X > «)|Z, t} A t ( M ; Z)d- 

where At(tt; Z) is the hazard function for X® given Z for t — 1/ — 1. Since 

A/) = Ez 



iE y {/(F, /(Z)T)|Z,T = 1} + iE y {/(F, /(Z)T)|Z,T = -1} 



=^E^ T | > «)Ai(«; Z) - /(X^ > u)A_i(u; Z) 

- J(*W > u)A(u; /) + e"^ /(X^ 1 * > u)A(«; /) | d«, 



5/ 



where 



Ht; f) 



E[I(X>u){\ 1 (u;Z) + \_ 1 (u;Z)}] 



E{e T /( z )/(X > u)} 
Setting the derivative at zero, the minimizer /*(z) satisfies 

e f(z) E{A*(X (1) )|Z = z} - e^'WElA'^^JlZ = z} 
=Prob(C (1) > X (1) |Z = z) - Prob(C ( - 1} > X ( - X) |Z = z) 
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for all z e Support of Z, where A*(u) = A(u, /*). When censoring rates are the same in two 
arms for all given z, 



/*(*) = -J log 



E{A*{XW)\Z = z} 
£{A*(X(-!))|Z = z} 



6.2 Justification of the optimal ao(z) in the efficient augmentation 

Let S(y, w*,7) be the derivative of the objective function /(2/,7'w*) with respect to 7. 7 is 
the root to an estimating equation 

N 

Qh) = N- 1 J2S(Y l ,W*,j)=0. 
i=i 

Similarly, the augmented estimator j a can be viewed as the root of the estimating equation 

N 

Q a ( 7 ) = N- 1 {S(Yi, W*, 7) - Ti ■ a(Z,)} = 0, 

i=i 

Since -E{Tj • a(Zj)} = due to randomization, the solution of the augmented estimating 
equation always converges to the 7* in probability. It is straightforward to show that 

N 

7 _ j* = N~ 1 A Q 1 Sfr, W*, 7*) + o P (iV- 1 ) 



i=i 



and 



N 



7a - 7* = N- 1 ^ 1 J2i S ^ W h 7*) - T,a(Z,)} + op^ 1 ) 



where Aq is the derivative of E{S(Yi, W*, 7)} with respect to 7 at 7 = 7*. Selecting the 
optimal a(z) is equivalent to minimizing the variance of {S(Yi, W*, 7*) — Tja(Zj)}. Noting 
that 

E [{S(Y h WlY) -T ia (Z,)H = E W*, 7 *) -T l a (Z l )H+E[{a(Z,)-a (Z l )r 2 ], 

where a (z) satisfies the equation 

E [{S(Y, W*, 7*) - Ta (Z)}T^(Z)] = 

for any function a (-) is the optimal augmentation term minimizing the variance of j a . 
Since a (-) is the root of the equation 



E 



{5(F,W*,7*)-Ta (Z)}'T 



0, 
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a (z) = - [E{S(Y, W(z)/2, 7 *)|Z = z,T = 1} - E{S(Y, - W(z)/2, 7 *)|Z = z,T = -1}] . 

For continuous response, 

5(y,YW*) = -^(Z) - ^TW(Z)' 7 j 

and 

a (z) = X - (E[-W(z){y - W(z)V/2}/2|T = 1, Z = z] - E[W(z){F + W(z) V/2}/2|T = 
= - W(z) |^E(y|T = 1, Z = z) + 1e(F|T = -1, Z = z) 
= -iw(z)E(F|Z = z) 



■1,Z 



For binary response, 



I r TW(Z)' T /2 -J 

S(KYW) = --W(Z)r|y - 1 + eTW(Z , 7/2 } 



and 



a (z) = - -W(z) 



E<^r 



e W(z)' 7 */2 
1 _|_ e W(z)' 7 */2 



T=l,Z = z^+E^F 



- -W(z) <^ E(Y\T = 1, Z = z) + E(Y\T = -1, Z = z) - 



e -W(z)' 7 */2 

" 1 + e -W(z)' 7 */2 
e W(z)' 7 */2 



T = —1, Z 

e -W(z)'7*/2 



1 _|_ e W(z)' 7 */2 J + e -W(z)' 7 */2 



= -iw(z){ E (F|Z = z)-i 



For survival response, the estimating equation based on the partial likelihood function is 
asymptotically equivalent to the estimating equation iV" 1 YliLi ^O^u W*, 7 ) = 0, where 



Thus, 



a o(z) = 



W*, 7) = - / [W* - R(u; 7*)] M(du, W*, 7*). 
./o 



^W(z) {G^r; z) + G 2 (r; z)} - jT R(u){Gi(du; z) - G 2 (d«; z)} 
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6.3 Lasso algorithm in the efficient augmentation 

In general, the augmentation term is in the form of ao(Zj) = W(Zj)'r(Zj), where r(Zj) is a 
simple scalar. The lasso regularized objective function can be written as 

1 m, ywn - vwmzo} + a| 7 |. 

i=i 

In general, this lasso problem can be solved iteratively. For example, when /(•) is the log- 
likelihood function of the logistic regression model, then with we may update 7 iteratively 
by solving the standard OLS-lasso problem 

1 N 

-5>^-ywn 2 + A| 7 | 

1=1 

where 7 is the current estimator for 7, 

Zi = 7'W* + wr l {Yi -pi- r(Zi)}, w t = pi{l - pi) 

and 

, exp{yW*} 
Pt l + exp{ 7 'W*}' 
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Figure 3: Boxplots for the correlation coefficients between the estimated score and true treat- 
ment effect with three different methods applied to continuous outcomes. The empty and 
filled boxes represent low and high dimensional (p = 50 and p = 1000 y* cases, respectively. 
Left upper panel: moderate main effect and independent covariates; right upper panel: mod- 
erate main effect and correlated covariates; left lower panel: big main effect and independent 
covariates; right lower panel: big main effect and correlated covariates. 
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Figure 4: Boxplots for the correlation coefficients between the estimated score and true treat- 
ment effect with three different methods applied to binary outcomes. The empty and filled 
boxes represent low and high dimensional (p = 50 and p = 1000 ) cases, respectively. Left 
upper panel: moderate main effect and independent covariates; right upper panel: moder- 
ate main effect and correlated covariates; left lower panel: big main effect and independent 
covariates; right lower panel: big main effect and correlated covariates. 
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True score 



Estimated score 




Figure 5: Left panel: the boxplots for the group differences in Y in subgroups stratified by 
the optimal score; right panel: the boxplots for the group differences in Y in subgroups strat- 
ified by posterior probability based on the independently trained nearest shrunken centroid 
classifier. 
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Figure 6: Boxplots for the correlation coefficients between the estimated score and true treat- 
ment effect with three different methods applied to survival outcomes. The empty and filled 
boxes represent low and high dimensional (p = 50 and p = 1000 ) cases, respectively. Left 
upper panel: moderate main effect and independent covariates; right upper panel: moder- 
ate main effect and correlated covariates; left lower panel: big main effect and independent 
covariates; right lower panel: big main effect and correlated covariates. 
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days 



Figure 7: Survival functions of the Tamoxifen and alternative treatment groups in 393 breast 
cancer patients, red line, Tamoxifen treatment group; black line, alternative treatment group 
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Figure 8: Survival functions of the Tamoxifen and alternative treatment groups stratified 
by the interaction score in the test sets: red line, Tamoxifen treatment group; black line, 
alternative treatment group. Upper panels: the score based on the "new" method; middle 
panels: the score based on "new/augmentated" method; lower panel: the score is based on 
"full regression" method. 
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Figure 9: Boxplots for differences in log(hazard ratio) between high and low risk groups based 
on 100 random splitting on GSE6532. The big positive number represents high quality of 
the constructed score in stratifying patients according to individualized treatment effect. 
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